bernstein condition
Coresets for Clustering Under Stochastic Noise
Lingxiao Huang, Zhize Li, Nisheeth K. Vishnoi, Runkai Yang, Haoyu Zhao
We study the problem of constructing coresets for $(k, z)$-clustering when the input dataset is corrupted by stochastic noise drawn from a known distribution. In this setting, evaluating the quality of a coreset is inherently challenging, as the true underlying dataset is unobserved. To address this, we investigate coreset construction using surrogate error metrics that are tractable and provably related to the true clustering cost. We analyze a traditional metric from prior work and introduce a new error metric that more closely aligns with the true cost. Although our metric is defined independently of the noise distribution, it enables approximation guarantees that scale with the noise level. We design a coreset construction algorithm based on this metric and show that, under mild assumptions on the data and noise, enforcing an $\varepsilon$-bound under our metric yields smaller coresets and tighter guarantees on the true clustering cost than those obtained via classical metrics. In particular, we prove that the coreset size can improve by a factor of up to $\mathrm{poly}(k)$. Experiments on real-world datasets support our theoretical findings and demonstrate the practical advantages of our approach.
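To make the coreset idea concrete, here is a minimal sketch of a *standard* importance-sampling (sensitivity-based) coreset for k-means, i.e. $(k, z)$-clustering with $z = 2$. This is not the paper's noise-aware algorithm; the helper names, the use of random points as a crude bicriteria solution, and the uniform mixing are illustrative assumptions.

```python
# Hedged sketch: sensitivity-style importance-sampling coreset for k-means.
# Assumptions (not from the paper): random points serve as rough centers,
# and sampling probabilities mix cost shares with the uniform distribution.
import numpy as np

def kmeans_cost(X, centers, weights=None):
    """Weighted k-means cost: sum_i w_i * min_j ||x_i - c_j||^2."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    if weights is None:
        weights = np.ones(len(X))
    return float((weights * d2).sum())

def sensitivity_coreset(X, centers, m, rng):
    """Sample m points with probability proportional to an upper bound on
    each point's sensitivity (its share of the total cost), reweighting by
    the inverse probability so the coreset cost is unbiased."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    p = d2 / d2.sum()
    p = 0.5 * p + 0.5 / len(X)      # mix with uniform to bound the weights
    idx = rng.choice(len(X), size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])          # inverse-probability weights
    return X[idx], w

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) + rng.choice([-4.0, 4.0], size=(2000, 1))
# Crude bicriteria centers: a few random points (k-means++ would be better).
centers = X[rng.choice(len(X), size=4, replace=False)]
S, w = sensitivity_coreset(X, centers, m=200, rng=rng)
full = kmeans_cost(X, centers)
core = kmeans_cost(S, centers, w)
print(abs(core - full) / full)      # small relative error in expectation
```

The reweighted coreset cost is an unbiased estimate of the full cost, and mixing with the uniform distribution keeps any single weight from blowing up; the paper's contribution concerns which *error metric* such a construction should be required to satisfy when the points themselves are noisy.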
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
- Information Technology > Security & Privacy (0.67)
From Stochastic Mixability to Fast Rates
Nishant A. Mehta, Robert C. Williamson
Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution $P$; it returns a hypothesis $f$ chosen from a fixed class $F$ with small loss $\ell$. In the parametric setting, depending upon $(\ell, F, P)$, ERM can have slow ($1/\sqrt{n}$) or fast ($1/n$) rates of convergence of the excess risk as a function of the sample size $n$. Several results give sufficient conditions for fast rates in terms of joint properties of $\ell$, $F$, and $P$, such as the margin condition and the Bernstein condition. In the non-statistical prediction-with-expert-advice setting, there is an analogous slow/fast rate phenomenon, and it is characterized entirely in terms of the mixability of the loss $\ell$ (there being no role there for $F$ or $P$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of the stochastic mixability of $(\ell, F, P)$, and in so doing provides new insight into the fast-rates phenomenon.
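The slow-versus-fast-rate distinction the abstract describes can be seen in a minimal simulation. For the squared loss over a finite class of constant predictors, the Bernstein condition holds and ERM's excess risk decays like $1/n$ rather than the slow $1/\sqrt{n}$. The grid class and Gaussian data below are illustrative assumptions, not the paper's setting.

```python
# Hedged sketch: ERM excess risk for squared loss over a finite class of
# constants decays like 1/n (a fast rate). Quadrupling n should roughly
# quarter the excess risk.
import numpy as np

rng = np.random.default_rng(1)
mu = 0.3                              # optimal constant predictor
grid = np.linspace(-1, 1, 201)        # finite hypothesis class F

def excess_risk_erm(n, trials=2000):
    """Monte Carlo estimate of ERM's excess risk at sample size n."""
    y = rng.normal(mu, 1.0, size=(trials, n))
    # For squared loss, ERM picks the grid point closest to the sample mean.
    means = y.mean(axis=1)
    f_hat = grid[np.abs(grid[None, :] - means[:, None]).argmin(axis=1)]
    # Excess risk of a constant a under squared loss is (a - mu)^2,
    # relative to the best constant in the class.
    best = grid[np.abs(grid - mu).argmin()]
    return float(((f_hat - mu) ** 2).mean() - (best - mu) ** 2)

r100, r400 = excess_risk_erm(100), excess_risk_erm(400)
print(r100 / r400)    # near 4 for a 1/n rate (near 2 would suggest 1/sqrt(n))
```

The ratio close to 4 when $n$ grows from 100 to 400 is the signature of a fast $1/n$ rate; the paper's contribution is a direct proof that stochastic mixability of $(\ell, F, P)$ suffices for this behavior.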
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)